SUMMARY


This report presents the research results obtained from the research grant entitled
“Active Control of Robot Manipulator Compliance,” funded by the Goddard Space Flight
Center (NASA) under Grant Number NAG 5-780, for the period between August 1st,
1989 and February 1st, 1990.

In this report, we present a trajectory control scheme, whose design is based on learning
theory, for a six-degree-of-freedom (DOF) robot end-effector built to study robotic
assembly of NASA hardware in space. The control scheme consists of two control systems:
the feedback control system and the learning control system. The feedback control
system is designed using the concept of linearization about a selected operating point and
the method of pole placement so that the closed-loop linearized system is stabilized. The
learning control scheme, consisting of PD-type learning controllers, provides additional
inputs to improve the end-effector performance after each trial. Experimental studies
performed on a 2 DOF end-effector built at CUA for 3 tracking cases show that actual
trajectories approach desired trajectories as the number of trials increases. In fact, the
tracking errors are substantially reduced after only 5 trials.



1 INTRODUCTION 


Repeatable tasks in a factory or in space can be performed by a robot manipulator that
is taught off-line via a so-called teaching and playback scheme, or by a manipulator that
is equipped with an on-line learning ability produced by a self-learning control scheme
without human supervision. Learning control theory, which originated from the concept
that robot manipulators, like human beings, can learn from measurement
data of previous operations in order to improve their performance in future operations,
has attracted control researchers' attention for many years [1] and recently has been
considered for control of robot manipulators [2]-[10]. Control of mechanical arms using
learning control theory was considered by Uchiyama [2], who proposed one of the first
learning control schemes to be applied to robotics. Realizing that it is difficult to obtain
full descriptions of robot manipulator dynamics due to unknown characteristics
such as friction, backlash, and non-rigidity, Arimoto and his co-workers [3] developed
a so-called Betterment Process to provide manipulators with a learning ability.
The betterment process is based on a simple iteration rule that generates a current
actuator input which is better than the previous one under the condition that a desired
output is specified. Applications of the betterment process to linear time-invariant systems
and to a class of nonlinear systems are presented in [4]. The concept of the betterment
process was further developed and applied to a learning-based position/force control
scheme [5], which was experimentally shown to be very effective in polishing a curved
object. Based on the principle of the betterment process, three types of learning control
schemes were proposed by Arimoto and others in [6], which also
addressed the convergence problem of the proposed schemes. The synthesis of repetitive
control systems for a subclass of systems whose outputs are controlled to follow periodic
reference commands was considered by Hara and his co-workers [7]. Relaxing the rank
condition imposed by Arimoto's learning control scheme [3] and using state variable
errors, Togai and Yamano [8] introduced a discrete learning algorithm to control discrete
systems performing repetitive operations. They showed that the discrete approach
is generally more advantageous than the analog approach used in [3]-[6]. Based on explicit
modeling of robot manipulators and using an inverse manipulator model, Atkeson
and McIntyre [9] proposed a learning algorithm to reduce trajectory following errors of
repetitive robot motions. Nguyen and others [10] combined the concepts of hybrid control
and learning control to design a learning-based hybrid control scheme for controlling
force and position in part assembly problems.

In this report, we consider the application of learning control theory to Cartesian
trajectory control of a 6 DOF robot end-effector built at the Goddard Space Flight
Center (NASA) to study telerobotic assembly of NASA hardware [11]. In particular,
a learning-based trajectory control scheme consisting of a feedback control system and
a learning control system is presented. The feedback control system ensures that the
linearized model of the closed-loop system is stable, while the learning system provides
additional inputs to the end-effector actuators so that the responses can be improved
after each trial. Using the methods of linearization about a desired pattern and pole
placement, we will show that proper selection of the learning control system gains will
provide the end-effector with an on-line learning ability so that it can autonomously
reduce its errors as the number of trials increases. The performance of the developed
learning control scheme will be investigated experimentally on a 2 DOF end-effector
and the results will be discussed.


2 THE ROBOT END-EFFECTOR 

Recently a 6 DOF end-effector was designed and built at NASA/Goddard Space Flight
Center to serve as a testbed for studying the feasibility of autonomous assembly of
parts in a telerobotic operation in space. As illustrated in Figure 1, the end-effector
resembles the structure of a Stewart platform [13], and mainly consists of a payload
platform, a base platform, six linear actuators and a gripper. The upper movable
payload platform is coupled to the base platform by six axially extensible rods;
recirculating ballscrews driven by dc motors provide the extensibility. The
motion of the upper payload platform is produced by the combination of extending
and shortening the actuator lengths. Each end of the actuator links is mounted to
the platforms by 2 rotary joints with intersecting and perpendicular axes. Solutions
of the forward and inverse kinematic problems and the equations of motion of the above
end-effector can be found in [11]-[12].

3 THE LEARNING CONTROL SCHEME 

Figure 2 presents the learning-based control scheme proposed to control the motion of 
the end-effector presented in the previous section. The control scheme mainly consists of
2 systems: the feedback control system and the learning control system. The feedback 
control system improves the end-effector dynamics in terms of system stability and 
tracking quality and the learning control system reduces the transient and steady-state 
errors of the end-effector responses after each trial. 

In the feedback control system, linear variable differential transformers (LVDTs)
serving as position sensors are mounted along the end-effector actuators to measure
their lengths $l_i$ for i = 1, 2, ..., 6, compactly represented by the joint position vector $\mathbf{l}$,
which is then compared with the desired joint position vector $\mathbf{l}_d$ to generate the joint
error vector $\mathbf{l}_e$. Since closed-form solutions exist for the inverse kinematic problem of
a closed-kinematic chain mechanism [11], inverse kinematics is employed in the above
control scheme to transform the desired Cartesian position vector $\mathbf{x}_d$ into the corresponding
desired joint position vector $\mathbf{l}_d$ (in this report, Cartesian position implies both position
and orientation). The joint errors then serve as the inputs
to the feedback controller, whose gains are designed such that the end-effector tracks a
set of desired Cartesian position trajectories with minimum settling time and minimum
steady-state errors. Here settling time is defined as the time the controller needs
to bring the actual response to within about 5 percent of the deviation between the
desired and actual time trajectories. Steady-state error denotes the constant deviation
between the desired and actual spatial Cartesian paths after the response has settled. In
[14], linearization and pole placement methods were employed to design the controller
gains and satisfactory results were obtained. The tracking performance of the
end-effector was further improved in [15], where the controller gains were designed using the
concepts of model reference adaptive control and Lyapunov theory. Simulation results
showed that although the responses were substantially improved in [15], there were
still some minor differences between the desired and actual responses due to dynamic
interferences caused by the nonlinearity of the end-effector dynamics. For tasks that are
repetitive, the transient and steady-state errors can be further reduced by equipping
the end-effector with a learning ability realized by a learning control system that can
"learn" from the joint position errors during a trial and provide additional signals to
improve the end-effector performance during the next trial.

Figure 3 illustrates the structure of the learning control system, which mainly consists
of a PD-type learning controller and a large-scale integrated random access memory
(LSI RAM). The learning process is described by the following scheme:

$$\mathbf{u}_{k+1} = \mathbf{u}_k + \boldsymbol{\Phi}\,(\alpha\,\mathbf{l}_e + \beta\,\dot{\mathbf{l}}_e) \tag{1}$$

where $\mathbf{u}_k$ denotes the output of the learning control system during the kth trial, $\boldsymbol{\Phi}$ is a
positive definite matrix, $\alpha$ and $\beta$ are non-negative scalars, and

$$\mathbf{l}_e = \mathbf{l}_d - \mathbf{l}. \tag{2}$$

During the kth trial, $\mathbf{u}_{k+1}$ is computed using (1) and stored in
the lower part of the RAM as a set of densely sampled digital data. After the kth trial,
the stored data are loaded into the upper part of the memory and sent to the
actuators during the (k+1)th trial. The lower part of the memory is then empty and
ready to store new data. During the kth trial, the input to the end-effector actuators is
composed of signals coming from the feedback controller, $\boldsymbol{\tau}_p$, the learning controller, $\mathbf{u}_k$,
and an auxiliary signal $\boldsymbol{\tau}_a$, namely

$$\boldsymbol{\tau}_k = \boldsymbol{\tau}_p + \mathbf{u}_k + \boldsymbol{\tau}_a. \tag{3}$$

The auxiliary signal $\boldsymbol{\tau}_a$ is included in (3) to compensate for the end-effector dynamics, as
seen later in the design of the controller gains.
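
The update (1) and the two-part memory described above can be sketched in a few lines of Python. This is a minimal illustration only, not the implementation used in the report: the class name `LearningController`, its methods, and the representation of the gain matrix as nested lists are assumptions made for this example.

```python
class LearningController:
    """PD-type learning update u_{k+1} = u_k + Phi (alpha*l_e + beta*dl_e), per sample.

    Hypothetical helper: names and data layout are illustrative assumptions.
    """

    def __init__(self, n_samples, n_joints, phi, alpha, beta):
        self.phi = phi        # (n_joints x n_joints) gain matrix as nested lists
        self.alpha = alpha
        self.beta = beta
        # "Upper" memory holds u_k, which is played back during the current trial.
        self.upper = [[0.0] * n_joints for _ in range(n_samples)]
        # "Lower" memory receives u_{k+1} while the current trial is running.
        self.lower = [[0.0] * n_joints for _ in range(n_samples)]

    def step(self, i, l_e, dl_e):
        """Return u_k at sample i and store u_{k+1} at sample i using (1)."""
        u_k = self.upper[i]
        pd = [self.alpha * e + self.beta * de for e, de in zip(l_e, dl_e)]
        self.lower[i] = [
            u + sum(p * v for p, v in zip(row, pd))
            for u, row in zip(u_k, self.phi)
        ]
        return u_k

    def end_trial(self):
        """Load the stored data into the playback half and clear the other half."""
        n_joints = len(self.phi)
        self.upper = self.lower
        self.lower = [[0.0] * n_joints for _ in self.upper]
```

During a trial, `step` would be called once per sample with the measured joint error and its rate, and its return value added to the feedback and auxiliary signals as in (3); `end_trial` plays the role of moving the data from the lower to the upper part of the memory.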

4 CONTROL SCHEME DESIGN 

Design of the proposed control scheme is performed in three steps: a) linearizing the
end-effector dynamics about $\mathbf{l}_d$, b) selecting the controller gains for the feedback control
system so that the linearized closed-loop control system is stable, and c) selecting the
controller gains for the learning control system so that the difference between the desired
and actual responses converges to zero as the number of trials increases to infinity.

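The dynamical equations of the end-effector are given by [12]

$$\mathbf{M}(\mathbf{l})\,\ddot{\mathbf{l}}(t) + \mathbf{N}(\mathbf{l},\dot{\mathbf{l}}) + \mathbf{G}(\mathbf{l}) = \boldsymbol{\tau}(t) \tag{4}$$

where $\mathbf{l}$ denotes the (6x1) joint variable vector containing the length $l_i$ as its ith row,
for i = 1, 2, ..., 6, and $\mathbf{M}(\mathbf{l})$, $\mathbf{N}(\mathbf{l},\dot{\mathbf{l}})$ and $\mathbf{G}(\mathbf{l})$ represent the (6x6) end-effector mass matrix,
the (6x1) centrifugal and Coriolis force vector, and the (6x1) gravitational force vector,
respectively.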

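Linearizing (4) about $\mathbf{l}_d$, the desired joint variable vector which corresponds to the
Cartesian variable vector $\mathbf{x}_d$, by using Taylor series expansion and neglecting higher
order terms, we obtain

$$\mathbf{M}(t)\,\ddot{\mathbf{z}}(t) + \mathbf{N}(t)\,\dot{\mathbf{z}}(t) + \mathbf{G}(t)\,\mathbf{z}(t) + \boldsymbol{\tau}_d(t) = \boldsymbol{\tau}(t) \tag{5}$$

where

$$\mathbf{z}(t) = \mathbf{l}(t) - \mathbf{l}_d(t) \tag{6}$$

$$\mathbf{M}(t) = \mathbf{M}(\mathbf{l})\Big|_{\mathbf{l}_d} \tag{7}$$

$$\mathbf{N}(t) = \frac{\partial}{\partial \dot{\mathbf{l}}}\left[\mathbf{N}(\mathbf{l},\dot{\mathbf{l}})\right]\Big|_{\mathbf{l}_d} \tag{8}$$

$$\mathbf{G}(t) = \frac{\partial}{\partial \mathbf{l}}\left[\mathbf{M}(\mathbf{l})\,\ddot{\mathbf{l}}_d + \mathbf{N}(\mathbf{l},\dot{\mathbf{l}}) + \mathbf{G}(\mathbf{l})\right]\Big|_{\mathbf{l}_d} \tag{9}$$

$$\boldsymbol{\tau}_d(t) = \mathbf{M}(\mathbf{l}_d)\,\ddot{\mathbf{l}}_d(t) + \mathbf{N}(\mathbf{l}_d,\dot{\mathbf{l}}_d) + \mathbf{G}(\mathbf{l}_d). \tag{10}$$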

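However, we have

$$\boldsymbol{\tau}(t) = \boldsymbol{\tau}_p(t) + \mathbf{u}(t) + \boldsymbol{\tau}_a(t) \tag{11}$$

where $\boldsymbol{\tau}_p(t)$, the output of the PD controller of the feedback control system,
is given by

$$\boldsymbol{\tau}_p(t) = \mathbf{K}_p\,\mathbf{l}_e + \mathbf{K}_d\,\dot{\mathbf{l}}_e \tag{12}$$

and $\mathbf{K}_p$ and $\mathbf{K}_d$ are the controller gain matrices of the PD controller.
Now substituting (11)-(12) into (5) yields

$$\mathbf{M}(t)\,\ddot{\mathbf{z}}(t) + \left[\mathbf{N}(t) + \mathbf{K}_d\right]\dot{\mathbf{z}}(t) + \left[\mathbf{G}(t) + \mathbf{K}_p\right]\mathbf{z}(t) = \mathbf{u}(t) \tag{13}$$

where we let

$$\boldsymbol{\tau}_a(t) = \boldsymbol{\tau}_d(t) \tag{14}$$

and it is noted that

$$\mathbf{l}_e(t) = -\mathbf{z}(t). \tag{15}$$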
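
The step from (5) to (13) can be spelled out as follows. This is only a restatement of the substitution using (11), (12), (14) and (15), with no assumptions beyond those equations.

```latex
\begin{align*}
\mathbf{M}(t)\ddot{\mathbf{z}} + \mathbf{N}(t)\dot{\mathbf{z}} + \mathbf{G}(t)\mathbf{z} + \boldsymbol{\tau}_d
  &= \boldsymbol{\tau}_p + \mathbf{u} + \boldsymbol{\tau}_a
  && \text{(5) with (11)} \\
  &= \mathbf{K}_p\mathbf{l}_e + \mathbf{K}_d\dot{\mathbf{l}}_e + \mathbf{u} + \boldsymbol{\tau}_d
  && \text{(12) and (14)} \\
  &= -\mathbf{K}_p\mathbf{z} - \mathbf{K}_d\dot{\mathbf{z}} + \mathbf{u} + \boldsymbol{\tau}_d
  && \text{(15)}
\end{align*}
```

Cancelling $\boldsymbol{\tau}_d$ on both sides and moving the two gain terms to the left-hand side gives (13).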





The system represented by (13) is a linear time-varying control system which can
be asymptotically stabilized by properly selecting $\mathbf{K}_p$ and $\mathbf{K}_d$ using, for instance, the
eigenvalue assignment method in [16].
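
As a rough illustration of gain selection, the sketch below places the poles of a single-joint, frozen-time version of (13), that is, the scalar equation m z'' + (n + k_d) z' + (g + k_p) z = u with constant m, n and g. This is a simplifying assumption made only for the example: it is not the eigenvalue assignment method of [16], and frozen-time pole placement does not by itself guarantee stability of a time-varying system. The function names are hypothetical.

```python
import cmath


def pd_gains_for_poles(m, n, g, p1, p2):
    """Gains (k_p, k_d) so that m*s^2 + (n + k_d)*s + (g + k_p) has roots p1, p2.

    p1 and p2 must be both real or a complex-conjugate pair.
    """
    # Desired polynomial: m*(s - p1)*(s - p2) = m*s^2 - m*(p1 + p2)*s + m*p1*p2
    k_d = (-m * (p1 + p2)).real - n
    k_p = (m * p1 * p2).real - g
    return k_p, k_d


def closed_loop_poles(m, n, g, k_p, k_d):
    """Roots of m*s^2 + (n + k_d)*s + (g + k_p), for checking the design."""
    b, c = n + k_d, g + k_p
    disc = cmath.sqrt(b * b - 4.0 * m * c)
    return (-b + disc) / (2.0 * m), (-b - disc) / (2.0 * m)
```

For instance, `pd_gains_for_poles(2.0, 1.0, 3.0, -4.0, -5.0)` yields gains for which `closed_loop_poles` returns the two requested poles.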

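Using (15), we proceed to rewrite (1) as

$$\mathbf{u}_{k+1}(t) = \mathbf{u}_k(t) - \boldsymbol{\Phi}\left[\alpha\,\mathbf{z}_k(t) + \beta\,\dot{\mathbf{z}}_k(t)\right] \tag{16}$$

where

$$\mathbf{u}_k(t) = \mathbf{M}(t)\,\ddot{\mathbf{z}}_k(t) + \left[\mathbf{N}(t) + \mathbf{K}_d\right]\dot{\mathbf{z}}_k(t) + \left[\mathbf{G}(t) + \mathbf{K}_p\right]\mathbf{z}_k(t). \tag{17}$$

We recall from (6) that $\mathbf{z}(t)$ denotes the error between the actual joint vector $\mathbf{l}(t)$
and the desired joint vector $\mathbf{l}_d(t)$. Therefore, to obtain good tracking quality, $\mathbf{z}_d(t)$, the
desired value for $\mathbf{z}(t)$, should be set to 0. In this case, (16) can be further rewritten as

$$\mathbf{u}_{k+1} = \mathbf{u}_k + \boldsymbol{\Phi}\left[\alpha(\mathbf{z}_d - \mathbf{z}_k) + \beta(\dot{\mathbf{z}}_d - \dot{\mathbf{z}}_k)\right]. \tag{18}$$

Equations (17) and (18) constitute the general learning scheme of the Cartesian trajectory
control scheme.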

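We proceed to present the following lemma:

Lemma 1 Consider a class of n-dimensional linear time-varying systems described by

$$\mathbf{R}(t)\,\ddot{\boldsymbol{\xi}}(t) + \mathbf{Q}(t)\,\dot{\boldsymbol{\xi}}(t) + \mathbf{P}(t)\,\boldsymbol{\xi}(t) = \boldsymbol{\eta}(t) \tag{19}$$

where $\boldsymbol{\xi}(t)$ and $\boldsymbol{\eta}(t)$ are the (n x 1) controlled variable vector and the (n x 1) input vector,
respectively. $\mathbf{R}(t)$, $\mathbf{Q}(t)$ and $\mathbf{P}(t)$ denote (n x n) time-varying matrices whose elements
are continuously differentiable on [0,T] for some positive constant T, and $\mathbf{R}(t)$ is positive
definite for all t in [0,T]. A learning scheme is defined by

$$\boldsymbol{\eta}_{k+1} = \boldsymbol{\eta}_k + \boldsymbol{\Gamma}\left[\alpha(\boldsymbol{\xi}_d - \boldsymbol{\xi}_k) + \beta(\dot{\boldsymbol{\xi}}_d - \dot{\boldsymbol{\xi}}_k)\right] \tag{20}$$

$$\mathbf{R}(t)\,\ddot{\boldsymbol{\xi}}_k(t) + \mathbf{Q}(t)\,\dot{\boldsymbol{\xi}}_k(t) + \mathbf{P}(t)\,\boldsymbol{\xi}_k(t) = \boldsymbol{\eta}_k(t) \tag{21}$$

$$\boldsymbol{\xi}_k(0) = \boldsymbol{\xi}_d(0); \quad \dot{\boldsymbol{\xi}}_k(0) = \dot{\boldsymbol{\xi}}_d(0) \tag{22}$$

where $\alpha$ and $\beta$ are non-negative constant scalars, $\boldsymbol{\xi}_d$ denotes the desired trajectory for
$\boldsymbol{\xi}$, and $\boldsymbol{\Gamma}$ is an (n x n) positive definite matrix.

If $\boldsymbol{\eta}_1$ is continuous, $\boldsymbol{\xi}_d$ is continuously differentiable on [0,T], and $\alpha$ and $\beta$ are selected
such that $0 < \alpha, \beta < 1$, then $\boldsymbol{\xi}_k$ converges to $\boldsymbol{\xi}_d$ uniformly on [0,T] as $k \to \infty$.

The proof of Lemma 1 can be found in [5]. We now present the main result of this
report.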

Main Result 1 Consider the robot end-effector whose dynamical equations and linearized
model are given by (4) and (5), respectively. If the desired Cartesian trajectory
vector $\mathbf{x}_d$ is continuously differentiable on [0,T], and $\mathbf{x}_k(0) = \mathbf{x}_d(0)$; $\dot{\mathbf{x}}_k(0) = \dot{\mathbf{x}}_d(0)$,
then the learning control scheme described in (17) and (18) can be designed so that the
difference between $\mathbf{x}_k$ and $\mathbf{x}_d$ converges to 0 as $k \to \infty$.
Proof: In [12], we showed that closed-form solutions exist for the inverse kinematic
problem of closed-kinematic chain mechanisms. Therefore, from the fact that

$$\mathbf{l}_d = IK(\mathbf{x}_d), \tag{23}$$

where IK denotes the inverse kinematic function of the end-effector, if $\mathbf{x}_d$ is continuously
differentiable on [0,T], then so is $\mathbf{l}_d$. In this case, using (7)-(9), we observe that $\mathbf{M}(t)$,
$\mathbf{N}(t)$ and $\mathbf{G}(t)$ are also continuously differentiable on [0,T]. Consequently, the matrices
in (17) are all continuously differentiable on [0,T]. In addition,

$$\mathbf{z}_k(0) = \mathbf{l}_k(0) - \mathbf{l}_d(0) = 0 = \mathbf{z}_d(0) \tag{24}$$

$$\dot{\mathbf{z}}_k(0) = \dot{\mathbf{l}}_k(0) - \dot{\mathbf{l}}_d(0) = 0 = \dot{\mathbf{z}}_d(0) \tag{25}$$

since

$$\mathbf{l}_k(0) = \mathbf{l}_d(0); \quad \dot{\mathbf{l}}_k(0) = \dot{\mathbf{l}}_d(0), \tag{26}$$

which are derived from the hypothesis of the Main Result. Besides, $\mathbf{M}(t)$ is positive definite
on [0,T] because so is $\mathbf{M}(\mathbf{l})$. The system represented by (17) is stabilized and consequently
$\mathbf{u}_1(t)$ is continuous on [0,T]. Now comparing (16) and (17) with (20) and (21),
respectively, and applying Lemma 1, if we select $\alpha$ and $\beta$ in (18) such that $0 < \alpha, \beta < 1$,
then $\mathbf{z}_k(t)$ converges to $\mathbf{z}_d(t) = 0$ uniformly on [0,T] as $k \to \infty$. In other words, $\mathbf{l}_k(t)$
converges to $\mathbf{l}_d(t)$, or equivalently $\mathbf{x}_k(t)$ converges to $\mathbf{x}_d(t)$, uniformly on [0,T] as $k \to \infty$.
The proof of the main result is completed.
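
The behaviour claimed by the main result can be illustrated numerically for a single joint. The sketch below is not the report's experiment: it assumes constant scalar coefficients m, c and k standing in for M(t), N(t) + K_d and G(t) + K_p in (17), and it adds a hypothetical repeatable disturbance d(t) to represent dynamics that the linearized model does not capture. All numerical values are illustrative assumptions.

```python
import math


def run_trial(u, d, m, c, k, dt):
    """Simulate m*z'' + c*z' + k*z = u(t) + d(t) from rest; return z and z' samples."""
    z, v = 0.0, 0.0
    zs, vs = [], []
    for un, dn in zip(u, d):
        zs.append(z)
        vs.append(v)
        a = (un + dn - c * v - k * z) / m
        v += dt * a          # semi-implicit Euler integration
        z += dt * v
    return zs, vs


def learn(trials=5, T=2.0, dt=0.001, m=1.0, c=20.0, k=100.0,
          phi=30.0, alpha=0.9, beta=0.9):
    """Repeat the task and apply the scalar form of (16) after each trial.

    Returns the maximum absolute tracking error |z| observed in each trial.
    """
    n = int(round(T / dt))
    # Hypothetical repeatable disturbance (assumption, not from the report).
    d = [5.0 * math.sin(2.0 * math.pi * i * dt / T) for i in range(n)]
    u = [0.0] * n            # learning input for the first trial
    max_errors = []
    for _ in range(trials):
        zs, vs = run_trial(u, d, m, c, k, dt)
        max_errors.append(max(abs(z) for z in zs))
        # Scalar version of (16): u_{k+1} = u_k - phi*(alpha*z_k + beta*z_k')
        u = [un - phi * (alpha * z + beta * v) for un, z, v in zip(u, zs, vs)]
    return max_errors
```

Calling `learn()` returns one maximum error per trial, which can be inspected to see how the error evolves over the five repetitions for these assumed values.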

5 EXPERIMENTAL STUDY OF 2 DOF CASE 

In this section, the proposed learning-based control scheme is implemented to control the
motion of a 2 DOF end-effector shown in Figure 4. The end-effector mainly consists of
2 ball-screw linear actuators driven by dc motors and hung below a stationary platform
via pin joints. Position feedback is accomplished by 2 LVDTs mounted along the
actuator links. The end-effector is controlled by a personal computer through a data
acquisition system consisting of an IBM board, an adapter and a software package
called Labtech Notebook. The PD controllers, learning controller, inverse kinematics, error
computation and joint force computation are implemented in Labtech Notebook. Based
on the diagram given in Figure 5, the Cartesian positions x and y, expressed with respect
to a reference coordinate system affixed to the stationary platform, are related to the
joint positions $l_1$ and $l_2$ as follows:

$$x = \frac{l_1^2 - l_2^2 + d^2}{2d} \tag{27}$$

and

$$y = \pm\frac{\sqrt{4d^2 l_1^2 - (l_1^2 - l_2^2 + d^2)^2}}{2d} \tag{28}$$
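
The kinematic relations (27)-(28) can be checked with a short round-trip computation. The sketch assumes a particular frame that the report does not spell out in the text: the first pin joint at the origin, the second at (d, 0), and the y axis pointing upward, so that an end-effector hanging below the platform has negative y. The function names are hypothetical.

```python
import math


def forward_kinematics(l1, l2, d):
    """Cartesian position (x, y) from actuator lengths, following (27)-(28).

    Assumes the end-effector hangs below the platform (negative root for y).
    """
    u2 = l1 ** 2 - l2 ** 2 + d ** 2
    x = u2 / (2.0 * d)
    y = -math.sqrt(4.0 * d ** 2 * l1 ** 2 - u2 ** 2) / (2.0 * d)
    return x, y


def inverse_kinematics(x, y, d):
    """Actuator lengths (l1, l2) for a point (x, y), pins assumed at (0, 0) and (d, 0)."""
    l1 = math.hypot(x, y)
    l2 = math.hypot(x - d, y)
    return l1, l2
```

For a desired path sampled as points (x, y), `inverse_kinematics` gives the desired joint lengths, and `forward_kinematics` recovers the same point from them.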
where d is the distance between the pin joints hanging the actuators. The Lagrangian
approach is applied to derive the following equations of motion:

$$\boldsymbol{\tau}(t) = \mathbf{M}(\mathbf{l},\dot{\mathbf{l}})\,\ddot{\mathbf{l}}(t) + \mathbf{N}(\mathbf{l},\dot{\mathbf{l}})\,\dot{\mathbf{l}}(t) + \mathbf{G}(\mathbf{l},\dot{\mathbf{l}}) \tag{29}$$

where

$$\boldsymbol{\tau}(t) = (\tau_1 \;\; \tau_2)^T; \quad \mathbf{l} = (l_1 \;\; l_2)^T \tag{30}$$

where $\tau_i$ and $l_i$ denote the joint force applied to and the length of the ith actuator for i = 1, 2,
respectively. Also

m, 0 

0 mj 


with 


and 


M 


Ci = 


Go — 


N = 


n tnilm(h-li) 

u 3u 

-h) Q 


3 u 


G = (G, G 2 ? 


-f h) - l 2 u 2 ] 

~ m \g[ 2 uil 7 (/] l m T 1 2 / m + 2/1/2) — hGiU 2 ) 

4dl 7 l 2 u 

~mgl m [2llu 2 (h + l 2 ) - l\U 2 ) 

— mi^[ 2 u 2 / 2 (/l/ m + h lm + 2/1/2) — him™ 2 ] 


4 dl 2 hu 

Ui = 1 2 — l 2 d 2 ; U 2 = l\ — l\ + d 2 ; u — \J 4d?l 2 — u 2 , 


(31) 


(32) 

(33) 

(34) 

(35) 

(36) 


where $m_1$ is the mass of the moving part of the link, $m$ the total mass of the link,
$l_m$ the fixed length of the actuators, and $g$ the gravitational acceleration.

Experiments were performed to study the performance of the proposed learning
control scheme when the end-effector tracks three different planar paths.
The experimental results are reported below, where in each study case we let the end-effector
repeat the task 5 times.


Case 1: Tracking a Straight Line 

The straight line to be followed by the end-effector is specified by y = -1.5x - CS [in cm]
where x(t) = 0.6t + 25.4 [in cm] and experimental results for this case are reported in
Figures 6a-c. Figures 6a and 6b represent the time responses of the horizontal and 
vertical errors, respectively with respect to the desired trajectories, of the 1st and 5th 
trials while Figure 6c represents the actual and desired planar motions at the 5th trial. 
As the results show, the tracking performance was improved substantially at the 5th 
trial which brought the maximum horizontal error from 0.87 cm down to 0.35 cm and 
the maximum vertical error from 0.47 cm down to 0.20 cm. 
Case 2: Tracking a Sinusoidal Path

The sinusoidal path to be followed by the end-effector is described by y = sin(2x - 50) - 83
[in cm] where x(t) = 0.6t + 25.4 [in cm]. Figures 7a-c report the experimental
results for this case. Figures 7a and 7b illustrate the time responses of the horizontal
and vertical errors, respectively, of the 1st and 5th trials, while Figure 7c represents
the actual and desired planar motions at the 5th trial. We observe that the tracking 
performance was improved significantly at the 5th trial which brought the maximum 
horizontal error from 0.84 cm down to 0.30 cm and the maximum vertical error from 
1.14 cm down to 0.5 cm. 


Case 3: Tracking a Circular Path 

Figures 8a-c present the experimental results of tracking a circular path specified by
(x - 34)^2 + (y + 83)^2 = 16 [in cm] where x(f) = 5sinf^t and y(i) = 5cos^jt. The
time responses of the horizontal and vertical errors are presented in Figures 8a and 8b, 
respectively and Figure 8c represents the actual and desired planar motions at the 5th 
trial. As the results show, the tracking performance was improved substantially at the 
5th trial which brought the maximum horizontal error from 1.87 cm down to 0.52 cm 
and the maximum vertical error from 1.54 cm down to 0.71 cm. 
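
The reported maxima for the three cases can be turned into relative reductions with a few lines of arithmetic. The numbers below are the first-trial and fifth-trial maximum errors (in cm) quoted in the text above; the dictionary layout and function name are assumptions made for this illustration.

```python
# (first trial, fifth trial) maximum errors in cm, as quoted in Cases 1-3.
reported = {
    "straight line": {"horizontal": (0.87, 0.35), "vertical": (0.47, 0.20)},
    "sinusoidal path": {"horizontal": (0.84, 0.30), "vertical": (1.14, 0.50)},
    "circular path": {"horizontal": (1.87, 0.52), "vertical": (1.54, 0.71)},
}


def percent_reduction(first, fifth):
    """Relative reduction of the maximum error between the 1st and 5th trials."""
    return 100.0 * (first - fifth) / first


reductions = {
    case: {axis: percent_reduction(*pair) for axis, pair in axes.items()}
    for case, axes in reported.items()
}

for case, axes in reductions.items():
    for axis, value in axes.items():
        print(f"{case:16s} {axis:10s} {value:5.1f} %")
```

With these figures every maximum error is reduced by more than half, with reductions ranging from roughly 54 to 72 percent.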

In the above experimental study, the following parameters were used: 


• End-Effector Parameters: d — 29 inches - , mj — 0.59 kg\ m = A.bkg 


• Feedback Control System: PD controller gains: 


K p 


22 ^ 0 

W . 

0 22 ^ 

in 


Kj 


0.7 


volt. sec 


o 


o 

Q 'J volt. sec 


• Learning Control System: 


a = 


20 ’ 


3 = 19 

P 20 


$ = 


220 0 
0 2 ’ 


6 CONCLUSION 

A learning-based control scheme was proposed in this report to control the Cartesian
trajectory of a 6 DOF end-effector performing assembly tasks that are repetitive. The
learning control scheme consists of a feedback control system that improves the
end-effector dynamics and a learning control system that can "learn" from errors to improve
the end-effector performance after each trial. Linearization about a desired trajectory
was applied to convert the nonlinear equations of motion of the end-effector into a
linear time-varying system which can be stabilized by properly selecting the feedback PD
controller gains using the eigenvalue assignment method. We then showed that the
learning controller gains can be designed such that the end-effector motion approaches
a desired motion in a repeatable assembly task as the number of trials increases.
Experimental studies performed on a 2 DOF end-effector showed that the errors decreased
as the number of trials increased. In particular, the tracking performance of the
end-effector was substantially improved after only 5 trials. Future research activities will
be directed to the investigation of the proposed learning control scheme on the 6 DOF
end-effector using computer simulation and experimentation. Attention should also be
paid to the development of a learning-based control system which consists of an on-line
adaptive feedback control system [15] and an off-line learning control system.
Implementation of a learning-based control scheme [10] to control position and force of the 6
DOF end-effector should also be investigated experimentally.
Figure 2: The Learning-Based Control Scheme










Figure 3: The Learning Controller 
Figure 4: The CUA 2 DOF End-Effector
Figure 6: Experimental Results of Tracking a Straight Line
Figure 7: Experimental Results of Tracking a Sinusoidal Planar Path
Figure 8: Experimental Results of Tracking a Circular Planar Path





